Relational objects for the optimized management of fixed-content storage systems

ABSTRACT

A system and method is described for managing data objects in a fixed-content storage system. Metadata is provided for each variable size packet and may include offset information, packet size data, reference content blocks, and the like. Using this information, intelligently decomposed objects, consolidated objects, differenced objects, and composite objects may be stored in the storage system. The data structure provided by these objects allows for the reduction of necessary storage resources and the total number of stored objects.

TECHNICAL FIELD

The present invention relates to fixed-content storage systems. In particular, the present invention relates to managing data objects in a fixed-content storage system.

BACKGROUND

A fixed-content object is a container of digital information that, once created, remains fixed. Examples of objects that could be fixed include medical images, PDF documents, photographs, document images, static documents, financial records, e-mail, audio, and video. Altering a fixed-content object results in the creation of a new fixed-content object. A fixed-content object once stored becomes immutable.

Fixed-content digital data is often subject to regulatory requirements for availability, confidentiality, integrity, and retention over a period of many years. As such, fixed-content data stores grow without bounds and storage of these digital assets over long periods of time presents significant logistical and economic challenges.

To address the economic and logistical challenges associated with storing an ever growing volume of information for long periods of time, fixed-content storage systems implement a multi-tier storage hierarchy and apply Information Lifecycle Management (ILM) policies that determine the number of copies of each object, the location of each object, and the storage tier for each object. These policies will vary based on the content of each object, age of each object, and the relevance of the object to the business processes.

A multi-site, multi-tier, large scale distributed fixed-content storage system is needed, for example, to address the requirement for storing multiple billions of fixed-content data objects. These systems ensure the integrity, availability, and authenticity of stored objects while ensuring the enforcement of Information Lifecycle Management and regulatory policies. Examples of regulatory policies include retention times and version control.

SUMMARY

Fixed-content storage systems grow as new objects are stored. This growth is accelerated by providing redundant copies of fixed-content objects in order to reduce the probability of data loss. As the size and complexity of the fixed-content storage system grow, the resources necessary to manage the storage system also increase. Improved data management techniques are therefore needed as the system scales to more efficiently store, organize, and manage data in a fixed-content storage system, while also fulfilling applicable regulations.

In one embodiment, a data object to be stored in a distributed fixed-content storage system is intelligently decomposed along the data object's logical boundaries. Intelligently decomposed objects are compared with other reference objects and, where they are identical, one reference object is stored and referenced by a reference content block. For example, a medical study archive contains thousands of instances of a template form with minor variations. For each instance, the template is stored separately from the additional data. Intelligent decomposition of the template data and the additional data when storing the archive allows for one instance of the template data to be referenced by other objects containing reference content blocks. Thus, storage resources may be used efficiently where identical data is stored in only as many places as required by regulatory or other requirements.

In another embodiment, multiple external data objects are consolidated into a single data object. The external data objects are accessed by reference to metadata that indicates an offset and size of the external data object. By consolidating many objects into a single object, the total number of data objects is reduced. This allows for the simplified management of the data stored in the fixed-content storage system.
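By way of illustration only, the offset-and-size access described above might be sketched as follows. The names Extent, consolidate, and read_member are hypothetical and do not appear in the described system; the sketch assumes the consolidated object is a simple concatenation of the external objects and that the index is held in metadata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    """Hypothetical metadata entry locating one external object."""
    offset: int  # position of the first byte inside the consolidated object
    size: int    # number of bytes belonging to the external object


def consolidate(objects):
    """Concatenate named byte strings into one blob and build an index.

    `objects` maps a name to its bytes. Returns (blob, index), where index
    maps each name to the Extent of its data inside the blob.
    """
    blob = bytearray()
    index = {}
    for name, data in objects.items():
        index[name] = Extent(offset=len(blob), size=len(data))
        blob += data
    return bytes(blob), index


def read_member(blob, index, name):
    """Return the bytes of one external object using only offset and size."""
    extent = index[name]
    return blob[extent.offset:extent.offset + extent.size]
```

In this sketch a single stored object replaces many, and each original object remains individually retrievable through its metadata entry.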

In another embodiment, differenced objects are created when an object stored in a fixed-content storage system is edited. The edits to the original object may represent a small change in the original object, but because the stored original object is immutable it is not possible to simply overwrite the small portion that is edited. In order to store the edited data without requiring duplication of existing data, a new object is created that references both the original object and the edited data. The metadata of the new object includes information relating to the offset and the size of the edited data so that the edited data is accessed instead of the corresponding portion of the original object.
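A minimal sketch of how a differenced object could be read is given below, for illustration only. The names Edit, DifferencedObject, and read_differenced are hypothetical, and the sketch assumes that each edit replaces a region of the same size in the original; the original bytes are never modified.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Edit:
    """Hypothetical record of edited data: where it applies and its bytes."""
    offset: int  # offset into the original object
    data: bytes  # edited data; its length serves as the size


@dataclass(frozen=True)
class DifferencedObject:
    """References the immutable original plus the edits that overlay it."""
    original: bytes
    edits: Tuple[Edit, ...]


def read_differenced(obj):
    """Return the object as seen by a reader: edits replace original regions."""
    view = bytearray(obj.original)  # a copy; the original stays untouched
    for edit in obj.edits:
        view[edit.offset:edit.offset + len(edit.data)] = edit.data
    return bytes(view)
```

Only the edited bytes and their offsets are stored in addition to the original, so the unchanged portion is not duplicated.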

In yet another embodiment, composite objects are provided that reference multiple objects. A manifest data object is created that references each object, and accessing the manifest data object allows for the identification, access, and management of objects joined in the composite object.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates various nodes in a distributed storage system.

FIG. 2 illustrates an embodiment of a fixed-content storage subsystem that comprises multiple data objects.

FIGS. 2A-E illustrate a method of intelligent decomposition and storage of content.

FIGS. 3A-C illustrate a method of object consolidation and storage of content.

FIGS. 4A-C illustrate a method of storing content as a differenced object.

FIGS. 5A-C illustrate a method of storing content as a composite object.

FIG. 6 illustrates a composite object utilizing various storage methods.

DETAILED DESCRIPTION

Continued adoption of digital technology in nearly all sectors including healthcare, media, government, and financial services is accelerating the creation of fixed-content data. Regulatory and business requirements for retention are resulting in the continued growth of data that must be stored and managed. In many sectors, the retention times exceed the practical lifetime of the storage media, and long term data archiving is an ongoing business challenge. As the archives grow, scaling limitations arise due to the size of the stored data as well as the number of fixed-content objects that need to be stored and managed. There is a market demand for fixed-content storage systems that can intelligently manage fixed-content data to provide for more efficient scaling.

Fixed-content storage involves the storage and management of data such that once stored, the data is immutable; it cannot be changed. Thus, locks are not required for alterations to the contents of the object. However, despite the object itself being immutable, additional objects may be stored that consist of minor variations of an existing object and many objects may have large amounts of identical data. Efficiency is provided according to certain embodiments by recognizing where these minor variations and duplicate data exist. Rather than providing more copies of any particular data than necessary, metadata is configured to provide references to data objects containing the data. Additionally, object management may be simplified by reducing the total number of objects or providing a single object that allows access to and management of additional objects.

Storage Grid Overview

As illustrated in FIG. 1, a typical fixed-content storage system deployment may involve multiple nodes, often spanning multiple geographically separated sites. When a request for information is made, the storage grid 200 may serve that request based on the location of the data, the location of the user, the load on the system, and the state of the network. This balances the load on the network, storage and servers in order to minimize bandwidth usage and increase performance. The storage grid 200 is a unified structure, but there may be multiple servers or repositories of content or metadata.

Nodes may be grouped based on the services they provide. For example, storage nodes 232, 236 may provide for secure data storage and transmission. A storage node may consist of a service running on a computing resource that manages storage and archival media such as a spinning media resource or tape.

The storage resource 224, 242 on a storage node can be based on any storage technology, such as RAID, NAS, SAN, or JBOD. Furthermore, this resource may be based on any grade of disk such as a high performance fiber channel or ATA disk. Storage nodes may be linked together over, for example, LAN and WAN network links of differing bandwidth.

Storage nodes can accept data and process retrieval requests, and information input into a storage node can be retrieved from other storage nodes. Storage nodes may process client protocol requests and include support for DICOM, HTTP and RTP/RTSP. Support for NFS/CIFS may be provided, for example, through gateway nodes.

Storage nodes may replicate and cache data across multiple sites and multiple nodes. Data replication is based on a set of configurable rules that are applied to the object metadata and may take into account geographic separation of nodes as well as the bandwidth between nodes. The logic that governs replication and distribution may be enforced by control nodes.

Gateway nodes 228 provide an interface through which external applications 220 may communicate with the storage grid. Gateway nodes 228 route incoming requests to storage nodes based on, for example, the available CPU, bandwidth, storage and geographic proximity. For applications that require direct file system access, the gateway nodes 228 may provide an NFS/CIFS interface to the storage grid.

Control nodes 238 may consist of separate software services, such as the Content Metadata Service (CMS) and the Administrative Domain Controller (ADC). Although these services can run on separate computing resources, they may also share a single server. The Content Metadata Service constitutes a distributed business rules engine that provides for content metadata storage, metadata synchronization, metadata query and enforcement of replication and information lifecycle management business logic. Replication and information lifecycle management policies may be based on metadata that is associated with stored objects. This allows the creation of business rules that determine where content is stored, how many copies are stored, and on what media it is stored throughout its lifecycle. A Content Metadata Service may interface, for example, with a local SQL database through a database abstraction layer.

The Administrative Domain Controller acts as a trusted authentication repository for node-to-node communication. It also provides knowledge of system topology and information to optimize real-time usage of bandwidth, CPU and storage resources. This allows automated management of computational resources and dynamic load balancing of requests based on the available CPU, storage and bandwidth resources.

The Administration Node 234 may consist of software components such as the Network Management Service and the Audit Service. These services may share a common computing resource, or they may be run on separate computing resources. A management interface 226 may be used to monitor and manage the operational status of the grid and associated services.

The Audit Service provides for the secure and reliable delivery and storage of audited events corresponding to content transactions across the entire storage grid. Audit events are generated, in real-time, by Storage Nodes and Control Nodes. Events are then relayed through the storage grid using a reliable transport mechanism and delivered to the Administration Nodes. Audit messages are processed by the Audit Service and may be directed to an external database or file.

The Network Management Service collects and processes real-time metrics on utilization of computing, storage and bandwidth resources. It provides real-time and historical usage reports. In addition, it is responsible for fault reporting and configuration management.

The Archive Node 230, 240 may manage a locally attached tape drive or library 246 for the archiving and retrieval of grid managed objects. Archive nodes may be added to diversify archive pools and to provide archival storage at multiple sites. The storage grid 200 may also utilize external storage resources, such as a managed tape library 222 or an enterprise SAN 224.

Storage Nodes and Control Nodes in the storage grid can be upgraded, decommissioned, replaced or temporarily disconnected without any disruption. Nodes do not need to run on the same hardware or have the same storage capacity. Nodes replicate and cache data across multiple sites and multiple nodes. In addition to bandwidth savings, the intelligent distribution of information provides for real-time backup, automated disaster recovery and increased reliability.

Capacity, performance and geographic footprint of the storage grid can be increased by adding nodes as needed, when needed, without impacting end-users. This enables the storage grid to accommodate thousands of terabytes of data across hundreds of locations. The storage grid combines the power of multiple computers to achieve extremely high levels of scalability and throughput. As nodes are added to the storage grid, they contribute to the available computational and storage resources. These resources are seamlessly utilized based on bandwidth availability and geographical suitability.

In traditional archives, information is stored as files, and access to data is gained through a path pointer stored in an external database. When storage scales, or when old storage is replaced or taken offline, this results in broken pointers and unavailable data. In order to scale, costly and disruptive migration procedures are required. Furthermore, it is difficult to operate in heterogeneous environments and multi-site deployments. This is because the approach relies on the underlying file system and network file system protocols.

Within the storage grid, data are stored and referenced as objects. An object can be one file or a collection of files with relationships that are defined by object metadata. Object metadata constitutes application specific information that is associated with a data object. This information can be attached to or extracted from the object at the time of input into the storage grid. Object metadata can be queried and the storage grid can enforce business rules based on this information. This allows for efficient utilization of storage/bandwidth resources, and enforcement of storage management policies.

In this object oriented architecture, external applications no longer use pointers to a path, but a universal handle to an object. This enables high levels of reliability, scalability and efficient data management without the need for disruptive migration processes. Multiple object classes can be defined and for each object class, there are specific business rules that determine the storage management strategy.

In this embodiment, the storage grid is fault tolerant, resilient and self-healing. Transactions continue to be processed even after multiple hardware, storage and network failures. The design philosophy is that hardware, network, and catastrophic failures will occur, and the system should be able to deal with faults in an automated manner without impacting the stored data or end-users.

Reliability is achieved through replicas, which are identical copies of objects (both data and metadata) that are stored on multiple nodes and kept synchronized. Increasing reliability involves adding nodes to the storage grid and increasing the number of replicas for each object. The location and number of the replicas are based on a set of rules that can be configured to ensure geographical separation and the desired level of redundancy. The storage grid will automatically enforce this logic across all nodes. If a failure is detected, the system is self-healing in that additional replicas are automatically created to restore the level of resiliency.

As nodes are added, removed or replaced, the system manages the available storage. Incoming data is transparently redirected to take advantage of the newly added storage capacity. Within the storage grid, objects are redistributed, purged, or replicated based on metadata and policies that are applied to the metadata. Objects can also migrate from one storage grade (e.g., disk) to another (e.g., tape) not simply based on time and date stamps, but also based on external metadata that indicates the importance of the object to the specific business application. For example, in medical applications, certain imaging exams may be immediately committed to deep storage. In applications for the financial sector, retention policies may be set up to facilitate compliance with regulatory requirements for data retention.

Users may input and retrieve data from the location within the storage grid that is closest to them, thereby efficiently utilizing bandwidth and reducing latency. In addition, as information is requested, it may be cached at the requesting Storage Node to enable improved bandwidth efficiency.

Obsolete components can be removed without impacting services or endangering stability and reliability. A Storage Node may be decommissioned through the administrative console. When this takes place, the storage grid may automatically redirect requests to alternate nodes. Furthermore, the storage grid may transparently re-distribute the stored data on other suitable Storage Nodes. This allows for seamless removal of obsolete hardware without any disruptions to storage grid operations. This is in contrast to disruptive data migration procedures that are common in many fixed content applications. Operators can eliminate support for obsolete hardware while taking advantage of the economic benefits of decreasing costs of storage and increases in processing power. Each newly added node costs less and provides more processing power and storage capacity.

When data and metadata are stored into the storage grid, the data and metadata are packaged into an object. Objects consist of data and associated metadata that are managed as an unalterable and atomic entity. Once stored, these objects are actively managed throughout their information lifecycle. When an object is retrieved, the original data and associated metadata are presented for use. This provides a transparent storage service to external entities.

Each object stored may have a unique identifier that acts as the primary identifier for the object. This identifier may be assigned at the time the object is created. Objects can be moved from one object store to another.

Objects stored within the grid may contain metadata, which is used to manage the objects over their lifecycle and facilitate access to the objects. Object metadata may include, for example, Content Block metadata, Protocol metadata, Content metadata, User metadata, or Management metadata.

Content Block metadata may be metadata associated with the object creation process itself, and provides information about the packaging and protection of the user provided data and metadata. An example of this type of metadata is the size of the data stored in a given object.

Protocol metadata may be metadata associated with the protocol used to store the object, but not intrinsic to the data within the object. This includes metadata required to perform protocol specific transactions. For data stored through the DICOM protocol, an example of this type of metadata is the DICOM AE title of the entity that stored the data.

Content metadata may include metadata contained within recognized types of content. If so processed, metadata specific to each recognized type of content is extracted from the content. For content of type PDF, an example of this type of metadata is the number of pages in a document.

User metadata may include arbitrary metadata specified by the entity storing content into the grid. This ability to attach user metadata is limited by the protocol used to store the objects. An example of this type of metadata is a private identifier assigned by the user.

Management metadata consists of metadata generated and modified over time as objects are managed within the grid. Unlike the previous four classes of metadata, this metadata is not immutable, and is not present as part of the object itself. An example of this type of metadata is the time when an object was last accessed.

Each time a new object is stored, the metadata associated with the object is also stored in a separate subsystem that maintains a repository of metadata. The metadata store can be queried to return the metadata associated with a given object. Queries can also be performed to return a list of objects and requested metadata for all objects that have metadata that matches a specific query.
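The two kinds of query described above can be sketched, purely as an example, with an in-memory SQL table. The table layout, the column names, and the functions build_store, metadata_for, and find_objects are assumptions made for illustration and do not describe the actual metadata repository.

```python
import sqlite3


def build_store(rows):
    """Create an in-memory metadata repository.

    `rows` is an iterable of (object_id, key, value) tuples; this
    three-column layout is a hypothetical schema chosen for the sketch.
    """
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE metadata (object_id TEXT, key TEXT, value TEXT)")
    db.executemany("INSERT INTO metadata VALUES (?, ?, ?)", rows)
    return db


def metadata_for(db, object_id):
    """Return all metadata associated with one object as a dict."""
    cur = db.execute(
        "SELECT key, value FROM metadata WHERE object_id = ?", (object_id,)
    )
    return dict(cur.fetchall())


def find_objects(db, key, value):
    """Return the identifiers of all objects whose metadata matches key=value."""
    cur = db.execute(
        "SELECT object_id FROM metadata WHERE key = ? AND value = ? "
        "ORDER BY object_id",
        (key, value),
    )
    return [row[0] for row in cur]
```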

Placement of objects may be based on the capabilities of the storage grid computing resources. Different computing resources have different capacity to perform work. While this is primarily measured based on the clock frequency of the processor, the number of processors and relative efficiencies of different processor families may also be taken into account. In addition, the amount of CPU resources that are currently in use provides a mechanism to determine how “busy” a given resource is. These characteristics are monitored and measured to allow decisions to be made within the grid about which computing resource is best suited to use to perform a given task.

Placement of objects may also be based on the characteristics of the storage resources, such as storage latency, reliability, and cost. Storage capacity provides information for calculating risk in the event of rebuild. A measurement of the amount of storage capacity that is currently in use provides a mechanism to determine how full a given storage resource is, and determine which locations are more able to handle the storage or migration of new content. Different storage resources have different throughput. For example, high performance Fiber-Channel RAID systems will deliver better performance than a lower performance software RAID on IDE drives. A measurement of the amount of I/O bandwidth that is currently in use provides a mechanism to determine the extent to which a given storage resource is able to handle additional transactions, and how much it will slow down current transactions. Storage resources can be read-only, and thus not a candidate for the storage of new objects. These characteristics may be monitored and measured to allow decisions to be made within the grid about which storage resource is best suited to use to retain objects over time, and influence the rules that determine where objects should be stored.

Placement of objects may also consider the characteristics of network paths, such as latency, reliability and cost. Different network paths have different amounts of bandwidth available. This directly maps into the time required to transfer objects from one storage repository to another. The amount of the network bandwidth that is currently in use may also be considered. This provides a mechanism to determine how “busy” a given network link is, and to compare the expected performance with the theoretical performance. These characteristics may be monitored and measured to allow decisions to be made within the grid about which network path is best suited to use to transfer objects through the grid.

When objects are stored in multiple different locations, the probability of data loss is reduced. By taking common-mode failure relationships and fault probability information into account, the probability of data loss and data inaccessibility for a given placement of objects can be quantified and reduced to manageable levels based on the value of the data in question.

To avoid common mode failures, replicas of objects can be placed in separate failure zones. For example, placement of two replicas within a single server room can take into account that two replicas stored on nodes that do not share a single UPS have a higher probability of accessibility than two replicas stored on two nodes that share the same UPS. On a larger scale, two replicas created in geographically distant locations have a lower probability of loss than two replicas within the same facility.
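One possible way to prefer separate failure zones is sketched below, for illustration only. The Node type, the penalty weights, and the greedy selection in place_replicas are assumptions made for the example; the rules actually applied by the grid are configurable and are not specified here.

```python
from typing import NamedTuple


class Node(NamedTuple):
    """Hypothetical description of a storage node's failure zones."""
    name: str
    site: str  # facility or geographic location
    ups: str   # power supply shared with other nodes in the same room


def shared_zone_penalty(candidate, chosen):
    """Count failure zones the candidate shares with already chosen nodes."""
    penalty = 0
    for other in chosen:
        if candidate.site == other.site:
            penalty += 1  # same facility
            if candidate.ups == other.ups:
                penalty += 1  # same facility and same UPS
    return penalty


def place_replicas(nodes, count):
    """Greedily pick `count` nodes, each sharing as few zones as possible."""
    chosen = []
    remaining = list(nodes)
    while remaining and len(chosen) < count:
        best = min(remaining, key=lambda n: shared_zone_penalty(n, chosen))
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

With two nodes on the same UPS, a third on a different UPS in the same room, and a fourth at a distant site, this sketch places the second replica at the distant site and the third on the separate UPS.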

As replica placement rules are metadata driven, they can be influenced by external systems and can change over time. Changes to existing replicas and changes to the topology of the grid can also influence replica placement rules.

Replica placement can reflect the instantaneous, historical and predictive information associated with a given resource. For example, monitoring of server and storage health can dynamically influence the degree of reliability attributed to a given resource. Different types of storage resources, such as IDE vs. SCSI, have different reliability characteristics. In addition, archival and offline storage often have a distinct media lifetime, which needs to be managed to preserve archive integrity. These are both examples of how information about available resources is used to determine the best solution for a given set of constraints.

Implementation of configuration information based on formal risk analysis can further optimize the resource tradeoff by providing information about common mode failures that cannot be automatically discovered by the grid. For example, the placement of two replicas on nodes situated along the same fault line may be considered to be within a common failure mode, and thus suboptimal when compared to the placement of one of the replicas in a facility not located on the fault.

The use of external data feeds can provide valuable information about changes in the reliability of a given failure zone. In one scenario, a live feed from the weather monitoring system can provide advance notice of extreme weather events, which could allow the grid to dynamically rebalance content to reduce the risks associated with the loss of connectivity to a given facility.

Content stored in a fixed-content storage system can be, but is not limited to, audio, video, data, graphics, text and multimedia information. The content is preferably transmitted via a distribution system which can be a communications network including, but not limited to, direct network connections, server-based environments, telephone networks, the Internet, intranets, local area networks (LAN), wide area networks (WAN), the WWW or other webs, transfers of content via storage devices, coaxial cable, power distribution lines (e.g., either residential or commercial power lines), fiber optics, among other paths (e.g., physical paths and wireless paths). For example, content can be sent via satellite or other wireless path, as well as wireline communications networks, or on the same path as a unit of power provided by a utility company.

Reference Blocks

According to some embodiments, novel data structures are utilized in order to allow certain features described herein. Objects stored within the storage system are stored as one or more packets. Each packet includes a certain non-zero amount of packet metadata and zero or more bytes of payload data. In a preferred embodiment, the quantity of packet metadata and the quantity of payload data vary among different packets. A maximum packet size or quantity of payload data may be utilized. For example, the maximum quantity of payload data in a variable size packet may be configured to be 16 KB. Each packet may include a predetermined identical amount of packet metadata and payload data in some embodiments.

The packet metadata may contain information allowing for the processing of variable sized packets when the amount of packet metadata and payload data is not predefined. Types of packet metadata include offset data, packet size data, and the like. This packet metadata may allow for the arbitrary retrieval of data in an object by identifying a specific packet or bytes within or across one or more packets.
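As a non-limiting sketch, retrieval of an arbitrary byte range across variable size packets could use the offset data as follows. The Packet type and the read_range function are hypothetical; the sketch assumes packets are contiguous and ordered by offset, and that a packet with zero bytes of payload shares its offset with the packet that follows it.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    """Hypothetical packet: offset metadata plus zero or more payload bytes."""
    offset: int     # position of the payload's first byte within the object
    payload: bytes  # packet size is taken from the payload length


def read_range(packets, start, length):
    """Return `length` bytes of the object beginning at byte `start`.

    Fewer bytes are returned if the range runs past the end of the object.
    """
    if start < 0 or length < 0:
        raise ValueError("start and length must be non-negative")
    offsets = [p.offset for p in packets]
    # Locate the last packet whose offset is at or before the first byte.
    i = max(bisect_right(offsets, start) - 1, 0)
    end = start + length
    pos = start
    out = bytearray()
    while i < len(packets) and pos < end:
        packet = packets[i]
        lo = pos - packet.offset
        chunk = packet.payload[lo:lo + (end - pos)]
        out += chunk
        pos += len(chunk)
        i += 1
    return bytes(out)
```

Because only offsets and sizes are consulted to locate the data, the sketch does not need to read packets that lie wholly before the requested range.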

FIG. 2 shows an embodiment of a fixed-content storage subsystem 700 that comprises multiple data objects. The data objects comprise metadata 701 and payload data 702. Furthermore, the fixed-content storage system 700 is accessible by a remote server 720.

As shown in FIG. 2, one or more packets may comprise reference content blocks 710 and/or floating reference content blocks 705 according to some embodiments. A reference content block 710 preferably has only packet metadata that refers to a different packet or content block, and does not contain any payload data. The packet metadata reference may cause an application accessing the reference content block to access some other packet(s) in place of the reference content block. For example, with a video file stored in a fixed-content storage system, a reference content block may be stored rather than another short video (such as a geographically specific clip). The reference content block may refer to that short clip stored separately, either in the fixed-content system or in another storage system.

A floating reference content block 705 is a reference content block that does not yet point to a packet or reference content block. Unlike reference content blocks 710, which are resolved at the storage system 700 (for example, by referring to a logical or physical memory address, or by referring to a particular object or instance), floating reference content blocks 705 are resolved at a server 720 or computing system outside the fixed-content storage system when the data is accessed. The packet metadata associated with the floating reference content block 705 specifies the size, duration, and/or other information that enables the server 720 to resolve the floating reference content block 705. Accordingly, an object comprising one or more packets may reference other objects or portions of other objects within the storage system 700. According to some embodiments and as shown in FIG. 2, a server 720 resolving a floating reference content block 705 may also resolve the storage location to an external storage system 730.

With floating reference content blocks, an object may reference variable data within the storage system. Though the data written to the fixed-content storage system 700 is not altered, floating reference content blocks 705 allow for the modification of an object as seen by an external user accessing the storage system 700. Floating reference content blocks may therefore be a powerful tool when used with a fixed-content storage system as described herein.

For example, if a medical report/form template is stored in a fixed-content storage system, there may be a number of blank fields. For each patient having a report stored, the values of these fields may be different, but the template is largely the same. If these fields are stored as floating reference content blocks, then the patient data may be stored separately for each patient, without duplicating the template data. When the data is accessed, for example by a medical professional, they may request information on one of the patients. The template would be loaded, and based on the patient information requested, the medical professional's computing system can resolve the floating reference content blocks in order to access the specific patient data requested along with the report form.

Floating reference content blocks may be resolved according to any criteria appropriate to the particular file. For example, a floating reference content block may be resolved based on the geographic location of the computing system accessing the data, an IP address, data submitted by the computing system, or the like.
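The report template example may be sketched as follows, for illustration only. The Payload and FloatingRef types and the resolve_object function are hypothetical; the `lookup` callable stands in for whatever criteria the accessing server or computing system applies when resolving a floating reference content block.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Payload:
    """Hypothetical packet carrying template bytes stored in the system."""
    data: bytes


@dataclass(frozen=True)
class FloatingRef:
    """Hypothetical floating reference: names a field, carries no payload."""
    field: str


def resolve_object(packets, lookup):
    """Assemble the object as seen by the requester.

    `lookup` maps a field name to bytes and is supplied by the accessing
    system, so the stored packets themselves are never modified.
    """
    parts = []
    for packet in packets:
        if isinstance(packet, FloatingRef):
            parts.append(lookup(packet.field))
        else:
            parts.append(packet.data)
    return b"".join(parts)


# One stored template, resolved differently for each patient requested.
template = [
    Payload(b"Name: "), FloatingRef("name"),
    Payload(b"\nDOB: "), FloatingRef("dob"),
]
patients = {
    "p1": {"name": b"A. Smith", "dob": b"1970-01-01"},
    "p2": {"name": b"B. Jones", "dob": b"1985-06-15"},
}
report = resolve_object(template, patients["p1"].__getitem__)
```

The same template packets serve every patient in this sketch; only the lookup supplied at access time differs.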

The metadata in a reference content block or a floating reference content block can override some of the metadata in a packet (or group of packets) that is pointed to. This may allow certain data stored in the fixed-content storage system to be treated differently according to how it is accessed. This in turn may allow for objects to be stored once rather than requiring near identical copies, as the data is immutable. By changing the management rules of the fixed-content storage system, more flexibility is obtained without modifying the protected data. Several embodiments of operations performed using reference content blocks and floating reference content blocks will be described in more detail below.

Intelligent Decomposition

FIGS. 2A-E demonstrate a method for intelligently decomposing data stored in a fixed-content storage system according to one embodiment. Intelligent decomposition stores data objects according to their logical boundaries and allows for single instance storage of objects or portions of objects that may be identical. For example, in some systems multiple instances of similar data are stored, where the difference is the payload within a well-known structure, such as a TAR archive. A TAR archive is the concatenation of one or more files.

FIG. 2A shows one embodiment of an implementation of intelligent decomposition data management techniques with reference to a TAR archive 10 for a medical system storing, for example, cardiology and radiology images. Other embodiments utilize other data file types having known boundaries. The TAR archive 10 includes two archived files 12, 14. Each archived file 12, 14 is preceded by a header block 16, 18. The archived file data is written unaltered except that its length is rounded up to a multiple of 512 bytes and the extra space is zero filled. The TAR headers 16, 18 may comprise 512 byte blocks of data indicating the size of each data file, the owner and group ID, the last modification time, and other data.

As discussed previously, objects such as a TAR archive may be stored in one or more packets. For example, FIG. 2B illustrates partitioning of the TAR archive 10 into five packets 20, 22, 24, 26, 28. The partitioning of the packets 20, 22, 24, 26, 28 was done without regard for the file boundaries within the TAR archive. Accordingly, the packets 20, 22, 24, 26, 28 contain data from various sources that may not be logically related. For example, the packet 24 contains data corresponding to file 12, header block 18, and file 14. There is no alignment of the TAR headers, and no references to data in external objects.

FIG. 2C illustrates the partitioning of the TAR archive 10 by using the file boundaries and the alignment of TAR headers. TAR header 16 is placed in packet 30, archived file 12 is placed in packets 32, 34, TAR header 18 is placed in packet 36, and archived file 14 is placed in packets 38, 40. Because the TAR archive 10 was partitioned along the TAR archive header and file boundaries, each of the TAR archive headers and files can be handled separately.
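Partitioning along header and file boundaries can be sketched in Python as follows. This is an illustrative sketch rather than the described implementation: it assumes the common ustar header layout, in which the file size is an octal string at bytes 124 to 135 of each 512 byte header, and the function name `split_tar` is hypothetical.

```python
BLOCK = 512  # header size and payload alignment, as described above


def split_tar(data: bytes):
    """Split raw TAR bytes into (header, padded_payload) pairs along file boundaries."""
    parts = []
    pos = 0
    while pos + BLOCK <= len(data):
        header = data[pos:pos + BLOCK]
        if header == bytes(BLOCK):
            break  # an all-zero block marks the end of the archive
        # Assumed ustar layout: size field is octal text at bytes 124..135.
        size = int(header[124:136].rstrip(b"\0 ") or b"0", 8)
        padded = (size + BLOCK - 1) // BLOCK * BLOCK  # round up to a multiple of 512
        payload = data[pos + BLOCK:pos + BLOCK + padded]
        parts.append((header, payload))
        pos += BLOCK + padded
    return parts
```

Each returned header corresponds to a header packet such as packet 30 or 36, and each payload could be further divided into packets such as 32 and 34 without crossing a file boundary.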

FIG. 2D illustrates an exemplary embodiment for storing the partitions from FIG. 2C as multiple objects. A master object 42 corresponds to the TAR archive 10. The master object 42 includes a component for each of the two files in the TAR archive. The first component includes metadata packet 25A, TAR file header packet 30 (from FIG. 2C), and reference block 27A. The second component includes metadata packet 25B, TAR file header packet 36 (from FIG. 2C), and reference block 27B.

Reference block 27A provides a reference to a reference object 46. Reference object 46 includes partitions 32, 34 corresponding to the first file 12 in the TAR archive 10, and packet metadata 25D and 25E. Reference block 27B provides a reference to a reference object 48. Reference object 48 includes partitions 38, 40 corresponding to the second file 14 in the TAR archive 10, and packet metadata 25F and 25G. Thus, each archived file 12, 14 is stored as a unique object and referenced by a master object.

FIG. 2D also includes a second master object 44. Master object 44 includes a packet 31 corresponding to a third header. In this example, the third header is found in a TAR archive that also contains the first data file 12. Rather than storing an additional reference object representing a duplicate copy of the reference object 46, the reference content block 27C references the existing stored reference object 46. By reducing the required storage of duplicate objects, the total amount of storage resources required by the fixed-content storage subsystem may be reduced.

Although the example shown in FIGS. 2A-2D relates to a TAR file, a similar procedure could be applied to other file types. In one example, a media file may contain a series of media clips, and each media clip could be treated as an object. In another example, a PDF file may contain pages or other content that could be treated as separate objects.

One embodiment of a process for intelligently decomposing objects stored to a fixed-content storage system is shown in FIG. 2E. The process begins at state 201 where an object to be stored is received. The object received is preferably of a type having a well-known file structure so that it can be decomposed or packetized at state 202 along its logical boundaries. For example, header data may be separated from payload data.

The decomposed object is thus broken into separate portions, each of which may comprise one or more packets. One of the portions is selected at state 203, and at decision state 204 it is determined if the selected portion is identical to an existing stored reference object. The existing object may comprise any other object, but is likely to be a reference object related to the current object being stored. For example, if the current object being stored is an instance of a medical study, then existing instances of the study may be identified based on metadata or additional data from the external system providing the object. If the portion already exists as a reference object, then the existing object is referenced by a reference content block at state 205. If the portion does not already exist in the storage system, then the decomposed object portion is stored at state 206. At decision state 207 it is determined whether the entire received object has been stored or referenced. If any portion remains, then the process returns to state 203. When all portions have been handled, then a master object exists in the storage system for the received object that references existing data as well as any new data. Thus, this process may advantageously be used in a fixed-content storage system in order to allow greater flexibility and reduce the need for increased storage space.
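The loop through states 203 to 207 might be sketched in Python as below. The in-memory store and the use of a SHA-256 digest to detect identical portions are assumptions made for illustration; the description leaves the identification mechanism open (for example, metadata from the external system).

```python
import hashlib


class FixedContentStore:
    """Toy in-memory stand-in for a fixed-content storage system (hypothetical)."""

    def __init__(self):
        self.objects = {}  # identifier -> immutable bytes

    def find_existing(self, portion: bytes):
        # Assumption: identical content is detected by its SHA-256 digest.
        ident = hashlib.sha256(portion).hexdigest()
        return ident if ident in self.objects else None

    def store(self, portion: bytes) -> str:
        ident = hashlib.sha256(portion).hexdigest()
        self.objects[ident] = portion
        return ident


def store_decomposed(store: FixedContentStore, portions):
    """Return a master object: an ordered list of references, one per portion."""
    master = []
    for portion in portions:                   # state 203, repeated via state 207
        ident = store.find_existing(portion)   # decision state 204
        if ident is None:
            ident = store.store(portion)       # state 206: store the new portion
        master.append(ident)                   # state 205: reference by identifier
    return master


store = FixedContentStore()
first = store_decomposed(store, [b"header-1", b"file-12"])
second = store_decomposed(store, [b"header-3", b"file-12"])  # shares file 12
```

After both calls the store holds three objects rather than four, because the second master object references the already stored copy of the shared file.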

In one embodiment, the decomposed object portion is stored prior to identification of existing instances of the object. After it is determined that equivalent content to the decomposed object portion is stored in another object, the identifier for the decomposed object portion may be repointed to the other object. The stored decomposed object portion may then be removed.
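A store-first, deduplicate-later variant could look like the following Python sketch. The separation of allocated identifiers from internal storage locations, and all names used here, are illustrative assumptions.

```python
class RepointingStore:
    """Toy store in which identifiers can be repointed to equivalent content."""

    def __init__(self):
        self.blobs = {}     # internal location -> stored bytes
        self.pointers = {}  # allocated identifier -> internal location
        self._next = 0

    def store(self, data: bytes) -> str:
        loc = self._next
        self._next += 1
        self.blobs[loc] = data
        ident = f"obj-{loc}"
        self.pointers[ident] = loc
        return ident

    def read(self, ident: str) -> bytes:
        return self.blobs[self.pointers[ident]]

    def deduplicate(self) -> None:
        first_seen = {}  # content -> location of the first stored copy
        for ident, loc in list(self.pointers.items()):
            keep = first_seen.setdefault(self.blobs[loc], loc)
            if keep != loc:
                self.pointers[ident] = keep  # repoint to the equivalent object
        live = set(self.pointers.values())
        for loc in list(self.blobs):
            if loc not in live:
                del self.blobs[loc]          # remove the now unreferenced copy
```

Identifiers handed out before deduplication remain valid afterwards, so master objects that reference them do not need to be rewritten.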

Object Consolidation

FIGS. 3A-C show a method of object consolidation for a fixed-content storage system. For multiple data objects representing individual instances of a particular group, it may be inefficient to store each instance as a separate object. Even when identical data is handled efficiently, the management of a large number of objects may create inefficiencies in object management.

As an example, a data object representing an advertisement is created for distribution and display in a variety of geographical areas. The advertisement data object may be configured to reference a large number of additional data objects (e.g., endings), with each of the additional data objects corresponding to one of the geographical areas. Rather than storing a separate data object including the advertisement data object for each additional data object or storing the advertisement data object once and storing each of the additional data objects separately, a single object may be created with each of the additional data objects stored back-to-back. When the advertisement object is accessed, a floating reference content block resolves to a different offset based on the geographic location. Thus, for 200 different regions, rather than storing a relatively large advertisement and 200 relatively short endings as 201 objects, the endings are stored back-to-back so that a single object is created including the advertisement and all of the endings. The cost of managing many small objects for different applications, sometimes having tens of thousands or more individual instances, can be quite large. Storing the small objects as a single object allows for random access retrieval while reducing the number of objects required, thus making storage management more cost effective.

As another example, a data object representing a medical study may include thousands of individual cases or instances. The cost of managing many small objects can be large from a licensing or hardware standpoint. Consolidating the cases or instances reduces the number of objects required. The individual cases or instances would still be accessible using offsets for random access.

FIG. 3A shows an example of object consolidation of two external data objects 51 and 52 according to one embodiment. The external data objects 51 and 52 may be any type of data object, such as media files, medical storage files, or the like. For example, external data object 51 may represent a first file of a medical study to be stored, and external data object 52 may represent an additional instance of the study. In another embodiment, the external data objects 51 and 52 are files that were originally stored in the same folder.

Rather than store external data objects 51 and 52 as separate objects, they may be stored as a single consolidated data object 50 as shown in FIG. 3B. Data object 50 comprises metadata 54, 55 and external data objects 51 and 52. Metadata 54, 55 may indicate, for example, an offset and size of a particular section of an object. While the example shown in FIGS. 3A and 3B shows only two external data objects consolidated to form data object 50, in some embodiments a different number of external data objects are consolidated. As the number of external objects increases, object consolidation as described herein provides additional efficiency in managing the objects in a fixed-content storage system.

FIG. 3C shows a process for creating a consolidated data object. At state 301 multiple objects are received or accessed. In some embodiments, these objects are accessed and consolidated from within a storage system. In some embodiments, multiple objects are received from an external computing system to be stored, and every object to be consolidated is received in a single data transfer. In some embodiments, one or more new objects to be consolidated with existing stored data are received.

At state 302, metadata is generated for the consolidated object that indicates an offset and size for the received data objects. For example, the metadata may indicate that a first data object stored in a consolidated data object may have no offset and be 64 KB, while the second data object may have a 64 KB offset and be 32 KB.

At state 303, the multiple received objects are stored back-to-back as a single object. Any reference to the multiple received objects can be handled by the consolidated object that will reference each of the received objects by offset. Accordingly, management of many related objects may be simplified and costs reduced because a smaller number of objects are stored in the storage system.
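States 301 to 303 can be illustrated with a short Python sketch that uses the 64 KB and 32 KB example above. The dictionary layout and function names are hypothetical.

```python
def consolidate(objects):
    """Build one consolidated object from several received objects."""
    metadata = []
    offset = 0
    for obj in objects:                                   # state 301: received objects
        metadata.append({"offset": offset, "size": len(obj)})  # state 302
        offset += len(obj)
    # State 303: the payloads are stored back-to-back as a single object.
    return {"metadata": metadata, "payload": b"".join(objects)}


def read_member(consolidated, index: int) -> bytes:
    """Random access to one original object by its recorded offset and size."""
    entry = consolidated["metadata"][index]
    start = entry["offset"]
    return consolidated["payload"][start:start + entry["size"]]


KB = 1024
obj_51 = b"\x01" * (64 * KB)
obj_52 = b"\x02" * (32 * KB)
obj_50 = consolidate([obj_51, obj_52])
```

Here the first entry has offset 0 and size 64 KB, and the second has a 64 KB offset and size 32 KB, matching the metadata described for state 302.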

Differenced Objects

Because data in fixed-content storage systems is immutable, small changes made to large files may be handled inefficiently by traditional systems. For example, a large database containing approximately 50 GB of data is stored as an object in a fixed-content storage system. An edit to that database is made by a user that comprises approximately 100 KB of changed data. The originally stored object cannot be modified with these changes in the fixed-content storage system, as the stored data may not be edited. In traditional fixed-content storage systems, even though the vast majority of the data from the original object has not been changed, a new object must be stored including the more than 49 GB that remains identical.

As another example, medical data may include an image and corresponding demographic data. The size of the image is much larger than the corresponding demographic data. Thus, a 50 MB image object may be updated merely to write 32 bytes of patient name information.

FIGS. 4A-C show an example of a method for generating and storing a differenced object in a fixed-content storage system to more efficiently handle such changes according to one embodiment. FIG. 4A shows an original data object 60 and an edited data object 65 as stored in a traditional fixed-content storage system. Original object 60 comprises metadata 71 and payload data 61A-C. For example, the original data object 60 may be a 50 MB radiology image along with a relatively small amount of associated data 61B that represents patient name, demographic data, and the like. The associated data 61B may represent, for example, 32 bytes of the 50 MB data object 60. When a change is made to the associated data 61B, a typical fixed-content system may store the edited object as a new data object 65 that includes most of the data from the original data object 60, but has replaced the associated data 61B with the edited data 66.

Rather than storing, as shown in FIG. 4A, the original object 60 and a separate object 65 containing the entire original object with the edited data 66, FIG. 4B shows a method for storing a differenced object including essentially only the changes. FIG. 4B shows original object 60 comprising packet metadata 71 and payload data 61A-C. An edit represented by data 66 has again been made to the associated data 61B representing a small portion of the original object 60. A differenced object 70 is created as the edited object. Differenced object 70 comprises reference content block 72A. Reference content block 72A references the original object 60 so that the data shared by the edited object 65 and the original object 60 may be accessed by differenced object 70 without storing additional copies of the data. Reference content block 72A further references an object including metadata 71, edited data 66, and reference content block 72B. The reference content block 72A and the reference content block 72B may indicate the location or offset where associated data 61B of the original object 60 is to be replaced by edited data 66 when the edited and differenced object 70 is accessed, the size of the edited data 66, the size of the associated data 61B, and the like. Referencing the identical data from the original object 60 allows original object 60 to be maintained as a fixed-content object, while small changes are efficiently stored to create additional instances of edited objects.

FIG. 4C is a flowchart indicating one embodiment of a process for generating a differenced object. At state 401, an edited object is received. Next, at state 402, the edited object is compared to the original object. In the example shown in FIGS. 4A and 4B, associated data 61B is shown as the payload data from one packet. However, in some embodiments edits may comprise only a portion of the payload data from a packet or may comprise multiple packets or portions thereof. Furthermore, although edited data 66 is shown in FIGS. 4A and 4B as containing the same quantity of data as the associated data 61B, this need not be the case. In some embodiments, the edited data may contain more or less data than the section of the original object it replaces.

In some embodiments, the fixed-content storage system is configured to determine whether to store a new object or create a differenced object based on the magnitude of the changes to the original object relative to the object's size. When the changes are larger than a threshold determined, for example, based on the size of the original object, the edited object is stored as a new object. When the changes are less than the determined threshold, then the edited object may be stored as a differenced object. For example, the threshold may be that the size of the edited data must not be larger than 50% of the size of the original file.
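The threshold test can be written as a one-line predicate. The following Python sketch uses the 50% example above as a default; the function and parameter names are hypothetical.

```python
def store_as_differenced(original_size: int, edited_size: int,
                         max_fraction: float = 0.5) -> bool:
    """Return True when the edit is small enough to store as a differenced object.

    The edited data must not be larger than max_fraction of the original size;
    otherwise the edited object would be stored as a new object.
    """
    return edited_size <= original_size * max_fraction


MB = 1024 * 1024
small_edit = store_as_differenced(50 * MB, 32)       # 32 byte edit to a 50 MB image
large_edit = store_as_differenced(50 * MB, 40 * MB)  # most of the object changed
```

With these inputs the 32 byte edit qualifies for a differenced object and the 40 MB edit does not.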

After the edited portions have been determined (and are determined to be small relative to the original object in some embodiments), then at state 403 a reference is stored to the original data object that may include metadata indicating which portions and how much of the original object is utilized by the edited object. At state 404, a reference is stored to the edited data. Metadata may also be stored that indicates the positioning of the edited data within the original object.

In some embodiments, differenced objects may additionally be ‘flattened’ when the original object they reference is no longer necessary. The referenced data from the original object may be copied and stored in the differenced object with all of the changes, creating a new object. The original object may then be deleted.
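The comparison, reference storage, access, and flattening steps can be sketched together in Python. This is a simplified illustration that assumes a single contiguous edited region, found by trimming the common prefix and suffix, and that uses a plain dictionary as a hypothetical store; the described embodiments also allow edits spanning multiple packets.

```python
from dataclasses import dataclass


@dataclass
class Differenced:
    original_id: str  # reference to the unchanged original object
    offset: int       # where the replaced section starts in the original
    old_size: int     # size of the section of the original being replaced
    edited: bytes     # edited data stored with the differenced object


def make_differenced(original_id: str, original: bytes, edited: bytes) -> Differenced:
    """State 402: compare the objects; states 403-404: record the references."""
    limit = min(len(original), len(edited))
    prefix = 0
    while prefix < limit and original[prefix] == edited[prefix]:
        prefix += 1
    suffix = 0
    while suffix < limit - prefix and original[-1 - suffix] == edited[-1 - suffix]:
        suffix += 1
    return Differenced(original_id, prefix, len(original) - prefix - suffix,
                       edited[prefix:len(edited) - suffix])


def read(diff: Differenced, store: dict) -> bytes:
    """Reassemble the edited object from the original plus the stored changes."""
    original = store[diff.original_id]
    return original[:diff.offset] + diff.edited + original[diff.offset + diff.old_size:]


def flatten(diff: Differenced, store: dict, new_id: str) -> str:
    """Copy the referenced data into a new standalone object, then drop the original."""
    store[new_id] = read(diff, store)
    del store[diff.original_id]
    return new_id
```

Because the edited data may be longer or shorter than the section it replaces, the sketch records the replaced size separately from the length of the edited data.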

Composite Objects

In order to realize certain advanced applications it may be desirable that several objects be grouped within a single container as a composite object. The objects may therefore be managed according to a single set of rules. For example, a medical study may contain a number of instances representing, for example, images captured as part of an examination. A user accessing the stored images may want to retrieve only one image of more than 500. If the user were forced to retrieve each image, a great deal of time and resources may be wasted. Selective retrieval of this kind may be accomplished using composite objects. For medical systems though, this is usually done using proprietary container files that are application-specific, or accomplished by using file-system directories as containers.

FIGS. 5A-C show an example of a method for storing composite objects in an object-oriented fixed-content storage system. FIG. 5A includes data objects 80, 85, and 90. In some embodiments, the contents of the data objects 80, 85, and 90 are related, but the objects represent different file types. In some embodiments, each data object used to form a composite object is of the same file type.

As shown in the embodiment of FIG. 5B, a manifest data object 100 is created in order to simplify the management of data objects 80, 85, and 90. Manifest data object 100 includes reference data 101, which references each sub-object 80, 85, and 90 in the composite object 100. In some embodiments, manifest data object 100 is compliant with certain standards such as XAM so that updated API commands access the manifest object. If data is changed, only the manifest and changed data need to be updated. Thus, composite objects described here provide a large degree of flexibility and increase data management capabilities.

In some embodiments, composite objects may be managed by a single set of rules, for example stored in the metadata 102 of manifest data object 100. In some embodiments, sub-objects referenced by the manifest data object 100 include a “managed as” field within the sub-object metadata that instructs the fixed-content storage system how to manage the given sub-object when it is desired that the object not be managed according to the manifest data object 100.

FIG. 5C shows an embodiment of a process for generating a composite object. At state 501, multiple objects that are to be related by the composite object are received or accessed. In some embodiments, multiple objects are received from an external computing system to be stored substantially simultaneously as a composite object. In some embodiments, multiple objects already stored in the fixed-content storage system are accessed in order to generate a composite object.

At state 502, a manifest object is generated. At state 503, reference data indicating the multiple objects received or accessed at state 501 is stored in the manifest object.

In a preferred embodiment, the reference data is stored as content data, rather than a metadata reference content block, in order to prevent the alteration of the manifest object in the storage system. In some embodiments, one or more reference content blocks are utilized.
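A manifest object of this kind might be sketched in Python as follows. The JSON encoding of the reference data, the dictionary layout, and the `managed_as` key (standing in for the “managed as” field) are assumptions made for illustration only.

```python
import json


def build_manifest(sub_object_ids, rules: str) -> dict:
    """States 502-503: create a manifest whose content lists the sub-objects."""
    # Reference data is stored as content, not as a metadata reference block.
    content = json.dumps({"references": list(sub_object_ids)}).encode("utf-8")
    return {"metadata": {"rules": rules}, "content": content}


def effective_rules(manifest: dict, sub_object_metadata: dict) -> str:
    """A sub-object's own 'managed as' value, if present, overrides the manifest."""
    return sub_object_metadata.get("managed_as", manifest["metadata"]["rules"])


def fetch_one(manifest: dict, store: dict, index: int) -> bytes:
    """Retrieve a single sub-object without reading the others."""
    ids = json.loads(manifest["content"])["references"]
    return store[ids[index]]


store = {"obj-80": b"image", "obj-85": b"report", "obj-90": b"notes"}
manifest_100 = build_manifest(["obj-80", "obj-85", "obj-90"], rules="retain-7y")
```

A user who wants one image out of several hundred would call `fetch_one` with the relevant index, and only that sub-object is read from the store.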

FIG. 6 demonstrates a composite object referencing several data objects using many of the data management techniques discussed herein. In the embodiment shown, manifest data object 110 references consolidated object 120, differenced object 140, and intelligently decomposed object 130. A skilled artisan will realize that these storage management systems and methods may be combined in a variety of ways without departing from the scope of the invention.

The high-level overview illustrated in the figures partitions the functionality of the overall system into modules for ease of explanation. It is to be understood, however, that one or more modules may operate as a single unit. Conversely, a single module may comprise one or more subcomponents that are distributed throughout one or more locations. Further, the communication between the modules may occur in a variety of ways, such as hardware implementations, software implementations, or a combination of hardware and software. Further, the modules may be realized using state machines, microcode, microprocessors, digital signal processors, or any other appropriate digital or analog technology.

It should be understood that the methods and systems described herein may be implemented in a variety of ways. Methods described herein may utilize other steps or omit certain steps. Other embodiments that are apparent to those of ordinary skill in the art, including embodiments which do not provide all of the benefits and features set forth herein, are also within the scope of the invention. For example, intelligent decomposition may be used to store objects even where multiple copies of objects are required according to lifecycle management policies or regulations. While some of the embodiments described herein provide specific details for implementation, the scope of the disclosure is intended to be broad and not limited to the specific embodiments described. Accordingly, details described in the specification should not be construed as limitations of the claimed invention. Rather, the scope of the claims should be ascertained from the language of the claims, which use terms consistent with their plain and ordinary meaning.

1. A method of storing immutable data objects in a computer readable storage, the method comprising: receiving a first original immutable data object comprising data having a plurality of parts; determining at least one logical boundary that divides the plurality of parts within the first original object; storing in a fixed content storage system at least one of the divided parts of the first original object as a first part object, wherein the first part object is allocated an identifier; storing in the fixed content storage system a first master object comprising metadata that includes the allocated identifier to reference the first part object of the first original object; receiving a second original immutable data object comprising data having a plurality of parts; determining at least one logical boundary that divides the plurality of parts within the second original object; storing in a fixed content storage system at least one of the divided parts of the second original object as a second part object, wherein the second part object of the second original object is allocated an identifier; storing in the fixed content storage system a second master object comprising metadata that includes the allocated identifier to reference the second part object of the second original object; determining the first part object referenced by the first master object has equivalent content to the second part object referenced by the second master object; repointing the allocated identifier of the second part object to point to the first part object of the first original object, and removing the second part object of the second original object.
2. The method of claim 1, wherein the first original object and the second original object comprise TAR files, the part objects correspond to TAR data parts, and wherein the equivalent part objects correspond to identical TAR data parts.
3. The method of claim 1, wherein a plurality of the divided parts of the first original object are stored as part objects, and the first master object comprises metadata that includes allocated identifiers to reference each of the stored part objects.
4. The method of claim 3, further comprising: determining at least one of the stored part objects referenced by the first master object has equivalent contents to at least one of the other stored part objects referenced by the first master object; repointing allocated identifiers of the stored part objects that have equivalent content to point to one of the equivalent stored part objects, and removing stored part objects with no identifiers pointing to them.
5. The method of claim 1, wherein at least one of the divided parts of the first original object is stored as a part object, with the remainder of the divided parts being stored in the master object.
6. A system configured to store immutable data objects in a computer-readable storage, the system comprising: a fixed content storage system comprising one or more storage devices; and one or more processors in communication with the fixed-content storage system, the processors configured with executable instructions to perform a method comprising: receiving a first original immutable data object comprising data having a plurality of parts; determining at least one logical boundary that divides the plurality of parts within the first original object; storing in the fixed content storage system at least one of the divided parts of the first original object as a first part object, wherein the first part object is allocated a first identifier; storing in the fixed content storage system a first master object comprising metadata based at least in part on the first identifier; receiving a second original immutable data object comprising data having a plurality of parts; determining at least one logical boundary that divides the plurality of parts within the second original object; determining that a second part object has equivalent content to the second part object, the second part object comprising at least one of the divided parts of the second original objects; and storing in the fixed content storage system a second master object comprising metadata based at least in part on the first identifier, whereby the fixed content storage system need not duplicatively store the second part object.
7. The system of claim 6, wherein the first original object and the second original object comprise TAR files, the part objects correspond to TAR data parts, and wherein the equivalent part objects correspond to identical TAR data parts.
8. The system of claim 6, wherein a plurality of the divided parts of the first original object are stored as part objects, and the first master object comprises metadata that includes allocated identifiers to reference each of the stored part objects.
9. The system of claim 8, wherein the method further comprises: determining at least one of the stored part objects referenced by the first master object has equivalent contents to at least one of the other stored part objects referenced by the first master object; repointing allocated identifiers of the stored part objects that have equivalent content to point to one of the equivalent stored part objects, and removing stored part objects with no identifiers pointing to them.
10. The system of claim 6, wherein at least one of the divided parts of the first original object is stored as a part object, with the remainder of the divided parts being stored in the master object.
11. A system of storing objects within a fixed content storage system, the system being configured to avoid storing duplicative content between multiple objects having equivalent parts, the system comprising: a fixed content storage system comprising a distributed network of computing nodes, the computing nodes comprising one or more storage devices; and one or more processors present at one or more of the computing nodes, the processors configured with executable instructions to perform a method comprising: receiving a first object comprising data having a plurality of parts divided by one or more logical boundaries; storing first metadata in the fixed content storage system, the first metadata comprising a reference to a first part object stored in the fixed content storage system, the first part object comprising one or more of the plurality of parts of the first object; receiving a second object comprising data having a plurality of parts divided by one or more logical boundaries; determining whether a second part object is equivalent to the first part object, the second part object comprising one or more of the plurality of parts of the second object; and storing second metadata in the fixed content storage system, the second metadata comprising a reference to the part object if the second part object is equivalent, or a reference to the second part object if the second part object is not equivalent, the second part object being stored in the fixed content storage system if the second part object is not equivalent.
12. The method of claim 11, wherein the method further comprises storing the first part object in the fixed content storage system.
13. The method of claim 11, wherein the method further comprises storing, in the fixed content storage system, data comprising the second part object, prior to the determination that the second part object is equivalent to the first part object.
14. The method of claim 12, wherein the method further comprises deleting, from the fixed content storage system, data comprising the second part object, based on the determination that the second part object is equivalent to the first part object.
15. The method of claim 11, wherein storing the first metadata in the fixed content storage system comprises storing a first master object comprising the first metadata, and wherein storing the second metadata in the fixed content storage system comprises storing a second master object comprising the second metadata.
16. The method of claim 11, wherein the processor is further configured to repeat the step of determining whether a second part object is equivalent to the first part object, for all of the parts of the second object.